Exploratory Data Analysis With Categorical Variables: An Improved Rank-by-Feature Framework and a Case Study
Abstract
Multidimensional datasets often include categorical information. When most dimensions have categorical information, clustering the dataset as a whole can reveal interesting patterns in the dataset. However, the categorical information is often more useful as a way to partition the dataset: gene expression data for healthy vs. diseased samples or stock performance for common, preferred, or convertible shares. We present novel ways to utilize categorical information in exploratory data analysis by enhancing the rank-by-feature framework. First, we present ranking criteria for categorical variables and ways to improve the score overview. Second, we present a novel way to utilize the categorical information together with clustering algorithms. Users can partition the dataset according to categorical information vertically or horizontally, and the clustering result for each partition can serve as new categorical information. We report the results of a longitudinal case study with a biomedical research team, including insights gained and potential future work. Color figures are available at www.cs.umd.edu/hcil/ben60
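The partition-then-cluster idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the tiny 1-D k-means, the function names `kmeans_1d` and `cluster_within_partitions`, the field names `status` and `expression`, and the sample values are all hypothetical, chosen only to show rows being split by a categorical variable, each partition being clustered separately, and the resulting cluster labels being attached as a new categorical variable.

```python
from collections import defaultdict


def kmeans_1d(values, k=2, iterations=20):
    """Tiny deterministic 1-D k-means; returns a cluster index per value."""
    ordered = sorted(values)
    # Seed centers at evenly spaced positions of the sorted values.
    centers = [ordered[(i * (len(ordered) - 1)) // max(k - 1, 1)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign each value to its nearest center (ties go to the lower index).
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Move each non-empty center to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels


def cluster_within_partitions(rows, category_key, value_key, k=2):
    """Partition rows by a categorical field, cluster each partition,
    and attach the cluster id as a new categorical field."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[category_key]].append(row)
    result = []
    for category, members in partitions.items():
        labels = kmeans_1d([m[value_key] for m in members], k=k)
        for member, label in zip(members, labels):
            # The new field is itself categorical and can be used for further partitioning.
            result.append({**member, "cluster": f"{category}-{label}"})
    return result


# Hypothetical sample data: one numeric measurement plus one categorical variable.
rows = [
    {"status": "healthy", "expression": 1.0},
    {"status": "healthy", "expression": 1.2},
    {"status": "healthy", "expression": 5.0},
    {"status": "diseased", "expression": 9.0},
    {"status": "diseased", "expression": 9.5},
    {"status": "diseased", "expression": 2.0},
]
labeled = cluster_within_partitions(rows, "status", "expression")
```

Each row in `labeled` keeps its original fields and gains a `cluster` field scoped to its partition, so the clustering result can be treated like any other categorical variable in later exploration.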
Journal:
- Int. J. Hum. Comput. Interaction
Volume 23
Publication date: 2007